Abstract

The popular criteria of optimality for quickest change detection procedures are the Lorden criterion, 
the Shiryaev-Roberts-Pollak criterion, and the Bayesian criterion. In this paper a robust version of these 
quickest change detection problems is considered when the pre-change and post-change distributions are 
not known exactly but belong to known uncertainty classes of distributions. For uncertainty classes that 
satisfy a specific condition, it is shown that one can identify least favorable distributions (LFDs) from 
the uncertainty classes, such that the detection rule designed for the LFDs is optimal for the robust 
problem in a minimax sense. The condition is similar to that required for the identification of LFDs for 
the robust hypothesis testing problem originally studied by Huber. An upper bound on the delay incurred 
by the robust test is also obtained in the asymptotic setting under the Lorden criterion of optimality. This 
bound quantifies the delay penalty incurred to guarantee robustness. When the LFDs can be identified, 
the proposed test is easier to implement than the CUSUM test based on the Generalized Likelihood Ratio 
(GLR) statistic which is a popular approach for such robust change detection problems. The proposed 
test is also shown to give better performance than the GLR test in simulations for some parameter values. 

Keywords: Quickest change detection, Minimax robustness, Least favorable distributions, CUSUM test, Shiryaev 
test. 



I. Introduction 

The problem of detecting an abrupt change in a system based on observations is a dynamic hypothesis 
testing problem with a rich set of applications. Such problems of change detection were first studied
by Page over fifty years ago in the context of quality control [2]. In its standard formulation there is a
sequence of observations whose distribution changes at some unknown point in time, referred to as the
'change-point'. The goal is to detect this change as soon as possible, subject to a false alarm constraint.
Some applications of change detection are intrusion detection in computer networks and security systems, 
detecting faults in infrastructure of various kinds, and spectrum monitoring for opportunistic access to 
wireless networks. 

Most of the past work in the area of change detection has been restricted to the setting where the
distributions of the observations prior to the change and after the change are known exactly (see, e.g.,
[2], [?], [?], [?]; for an overview of the work in this area, see [?], [?] and [9]). The three most popular
criteria for optimizing the tradeoff between detection delay and false alarm rate are the Lorden criterion
[?] and the Shiryaev-Roberts-Pollak criterion, in which the change-point is a deterministic quantity, and
Shiryaev's Bayesian formulation [10], in which the change-point is modeled as a random variable with
a known prior distribution. In this paper we study all three of these versions of change detection, under
the setting where the pre-change and post-change distributions are not known exactly but belong to
known uncertainty classes. We pose a minimax robust version of the standard quickest change detection
problem wherein the objective is to identify the change detection rule that minimizes the maximum delay
over all possible distributions. This minimization should be performed while meeting the false alarm
constraint for all possible values of the unknown distributions. We obtain a solution to this problem
when the uncertainty classes satisfy some specific conditions. Under these conditions we can identify
Least Favorable Distributions (LFDs) from the uncertainty classes, and the optimal robust change detection
rule is then the optimal (non-robust) change detection rule for the LFDs. These conditions are similar
to those given by Huber [11] for robust hypothesis testing problems. We also discuss related results on
robust sequential detection [11], [12] later in the paper.

Although there has been some prior work on robust change detection, these approaches are distinctly
different from ours. The maximin approach of [?] is similar in that the authors also identify LFDs for the
robust problem. However, their result is restricted to asymptotic optimality (as the false alarm constraint
goes to zero) under the Lorden criterion. A similar formulation is also discussed in [14, Sec. 7.3.1]. Some
other approaches to this problem (e.g., [?], [16]) are aimed at developing algorithms for quickest change
detection with unknown distributions. These works study the asymptotic performance of the proposed
tests under different distributions but do not seek to guarantee minimax robustness over a given class of
distributions.

A closely related problem is the composite quickest change detection problem. In general, these
problems also address the setting where the pre-change and post-change distributions are unknown.
However, unlike the robust problem, in composite problems one seeks to identify a change detection
procedure that is simultaneously optimal under all possible values of the unknown distributions. Exact
solutions to these problems are often intractable and hence most results are restricted to asymptotic
optimality. One such solution to a composite change detection problem is discussed in [?] when only
the post-change distribution is unknown. In [?] a test is given that is asymptotically optimal under the
Lorden criterion for all possible values of the unknown post-change distribution in a one-dimensional
exponential family of distributions. This test is also referred to as the Generalized Likelihood Ratio Test
(GLR Test), and was also studied in [?] and [18]. An alternate asymptotically optimal solution for the
setting in which both pre-change and post-change distributions are unknown was studied in [19].

June 2, 2010 DRAFT

We provide a performance comparison of our proposed robust test with the GLR test. Although the 
GLR test asymptotically performs as well as the optimal test with known distributions, we show via 
simulations that our robust test can give improved performance over the GLR test for moderate values 
of the false alarm constraint. The GLR test is also often prohibitively complex to implement in practice, 
while the proposed robust CUSUM test admits a simple recursive implementation. 

For the asymptotic version of the problem, we also provide an analytical upper bound on the delay 
incurred by our robust test and use it to provide an upper bound on the drop in performance of our test 
relative to the optimal non-robust test. 

The rest of the paper is organized as follows. We first state the problem that we are studying in Section
II. In Section III we describe the robust solution and present some analysis. We discuss some examples
in Section IV and conclude in Section V.

II. Problem Statement 

In the online quickest change detection problem we are given observations from a sequence
$\{X_n : n = 1, 2, \ldots\}$ taking values in a set $\mathcal{X}$. There are two known distributions
$\nu_0, \nu_1 \in \mathcal{P}(\mathcal{X})$, where $\mathcal{P}(\mathcal{X})$ is the set of probability
distributions on $\mathcal{X}$. Initially, the observations are drawn i.i.d. under distribution $\nu_0$.
Their distribution switches abruptly to $\nu_1$ at some unknown time $\lambda$, so that $X_n \sim \nu_0$
for $n \le \lambda - 1$ and $X_n \sim \nu_1$ for $n \ge \lambda$. The observations are stochastically
independent conditioned on the change-point. The objective is to identify the occurrence of change with
minimum delay subject to false alarm constraints. We use $E^{\nu}_m$ to denote the expectation operator and
$P^{\nu}_m$ to denote the probability law when the change happens at $m$ and the pre-change and post-change
distributions are $\nu_0$ and $\nu_1$ respectively. The symbols are replaced with $E^{\nu}_\infty$ and
$P^{\nu}_\infty$ when the change does not happen. Similarly, if the pre-change and post-change distributions
are some $\mu$ and $\gamma$, respectively, and the change happens at time $m$, we use $E^{\mu,\gamma}_m$
to denote the expectation operator and $P^{\mu,\gamma}_m$ the probability law. We further use
$\mathcal{F}_m$ to denote the sigma algebra generated by $(X_1, X_2, \ldots, X_m)$.

A sequential change detection procedure is characterized by a stopping time $\tau$ with respect to the
observation sequence. The design of the quickest change detection procedure involves optimizing the
tradeoff between two performance measures: detection delay and frequency of false alarms. There are
various standard mathematical formulations for the optimal tradeoff. In the minimax formulation of [?]
the change-point is assumed to be an unknown deterministic quantity. The worst-case detection delay is
defined as,

$$\mathrm{WDD}(\tau) = \sup_{\lambda \ge 1} \operatorname{ess\,sup}\, E^{\nu}_{\lambda}\big[(\tau - \lambda + 1)^+ \,\big|\, \mathcal{F}_{\lambda-1}\big]$$

where $x^+ = \max(x, 0)$. This quantity captures the worst-case value of the expected detection delay over
all possible locations of the change-point and all possible realizations of the pre-change observations.
The false alarm rate is defined as,

$$\mathrm{FAR}(\tau) = \frac{1}{E^{\nu}_{\infty}[\tau]}.$$

Here $E^{\nu}_{\infty}[\tau]$ can be interpreted as the mean time to false alarm. Under the Lorden criterion,
the objective is to find the stopping rule that minimizes the worst-case delay subject to an upper bound on
the false alarm rate:

$$\text{Minimize } \mathrm{WDD}(\tau) \text{ subject to } \mathrm{FAR}(\tau) \le \alpha \tag{1}$$

It was shown by Moustakides [?] that the optimal solution to (1) is given by the cumulative sum (CUSUM)
test proposed by Page [2]. We describe this test later in the paper.

An alternate formulation of the change detection problem was studied by Pollak [5]. Here too the
change-point is modeled as a deterministic quantity, but the delay to be minimized is no longer the
worst-case delay but a worst-case average delay (also referred to as supremum average detection delay
by some authors) defined by,

$$J_{\mathrm{SRP}}(\tau) = \sup_{\lambda \ge 1} E^{\nu}_{\lambda}[\tau - \lambda \mid \tau \ge \lambda].$$

The Shiryaev-Roberts-Pollak criterion of optimality of a stopping rule $\tau$ for change detection is given
by,

$$\text{Minimize } J_{\mathrm{SRP}}(\tau) \text{ subject to } \mathrm{FAR}(\tau) \le \alpha \tag{2}$$

where the minimization is over all stopping times $\tau$ such that $J_{\mathrm{SRP}}(\tau)$ is well-defined.
Pollak [5] established the asymptotic optimality of the Shiryaev-Roberts-Pollak (SRP) stopping rule for (2).

Another approach to change detection is the Bayesian formulation of [10]. Here the change-point
is modeled as a random variable $\Lambda$ with prior probability distribution $\pi_k = P(\Lambda = k)$,
$k = 1, 2, \ldots$. The performance measures are the average detection delay (ADD) and probability of
false alarm (PFA) defined by:

$$\mathrm{ADD}(\tau) = E^{\nu}[(\tau - \Lambda)^+], \qquad \mathrm{PFA}(\tau) = P^{\nu}(\tau < \Lambda)$$

where $E^{\nu}$ represents the expectation operator and $P^{\nu}$ the probability law when the pre-change and
post-change distributions are $\nu_0$ and $\nu_1$ respectively. For a given $\alpha \in (0, 1)$, the
optimization problem under the Bayesian criterion is:

$$\text{Minimize } \mathrm{ADD}(\tau) \text{ subject to } \mathrm{PFA}(\tau) \le \alpha \tag{3}$$

When the prior distribution on the change-point follows a geometric distribution, the optimal solution to
the above problem is given by the Shiryaev test [10].

The robust versions of (1), (2) and (3) are relevant when one or both of the distributions $\nu_0$ and
$\nu_1$ are not known exactly, but are known to belong to uncertainty classes of distributions,
$\mathcal{P}_0, \mathcal{P}_1 \subset \mathcal{P}(\mathcal{X})$. The objective is to minimize the worst-case
delay amongst all possible values of the unknown distributions, while satisfying the false-alarm constraint
for all possible values of the unknown distributions. Thus the robust version of the Lorden criterion is to
identify the stopping rule that solves the following optimization problem:

$$\min_{\tau} \sup_{\nu_0 \in \mathcal{P}_0,\, \nu_1 \in \mathcal{P}_1} \mathrm{WDD}(\tau) \quad \text{s.t.} \quad \sup_{\nu_0 \in \mathcal{P}_0} \mathrm{FAR}(\tau) \le \alpha. \tag{4}$$

Similarly, the robust version of the SRP criterion is:

$$\min_{\tau} \sup_{\nu_0 \in \mathcal{P}_0,\, \nu_1 \in \mathcal{P}_1} J_{\mathrm{SRP}}(\tau) \quad \text{s.t.} \quad \sup_{\nu_0 \in \mathcal{P}_0} \mathrm{FAR}(\tau) \le \alpha. \tag{5}$$

and the robust version of the Bayesian criterion is:

$$\min_{\tau} \sup_{\nu_0 \in \mathcal{P}_0,\, \nu_1 \in \mathcal{P}_1} \mathrm{ADD}(\tau) \quad \text{s.t.} \quad \sup_{\nu_0 \in \mathcal{P}_0} \mathrm{PFA}(\tau) \le \alpha. \tag{6}$$

The optimal stopping rule $\tau$ under each of the robust criteria described above has the following minimax
interpretation. For any other stopping rule $\tau'$ that guarantees the false alarm constraint for all values of
unknown distributions from the uncertainty classes, there is at least one pair of distributions such that
the delay obtained under $\tau'$ will be at least as high as the maximum delay obtained with $\tau$ over all
pairs of distributions from the uncertainty classes. In the rest of this paper we provide solutions to the
robust problems (4), (5) and (6) when the uncertainty classes satisfy some specific conditions.

III. Robust change detection 

A. Least Favorable Distributions 

The solution to the robust problem is simplified greatly if we can identify least favorable distributions
(LFDs) from the uncertainty classes such that the solution to the robust problem is given by the solution
to the non-robust problem designed with respect to the LFDs. LFDs were first identified for a simpler
problem - the robust hypothesis testing problem - by Huber et al. in [11] and [20]. It was later shown
in [21] that if the uncertainty classes satisfy a joint stochastic boundedness condition, one can identify
these LFDs. Before we introduce this condition, we need the following notation. If $X$ and $X'$ are two
real-valued random variables defined on a probability space $(\Omega, \mathcal{F}, P)$ such that,

$$P(X \ge t) \ge P(X' \ge t), \quad \text{for all } t \in \mathbb{R},$$

then we say, following [21], that the random variable $X$ is stochastically larger than the random variable
$X'$. We denote this relation via the notation $X \succ X'$. Equivalently, if $X \sim \mu$ and
$X' \sim \mu'$, we also write $\mu \succ \mu'$.

Definition 1 (Joint Stochastic Boundedness) [21]: Consider the pair $(\mathcal{P}_0, \mathcal{P}_1)$ of
classes of distributions defined on a measurable space $(\mathcal{X}, \mathcal{F})$. Let
$(\bar{\nu}_0, \underline{\nu}_1) \in \mathcal{P}_0 \times \mathcal{P}_1$ be some pair of distributions from
this pair of classes such that $\underline{\nu}_1$ is absolutely continuous with respect to $\bar{\nu}_0$.
Let $L^*$ denote the log-likelihood ratio between $\underline{\nu}_1$ and $\bar{\nu}_0$, defined as the
logarithm of the Radon-Nikodym derivative, $\log \frac{d\underline{\nu}_1}{d\bar{\nu}_0}$. Corresponding
to each $\nu_j \in \mathcal{P}_j$, we use $\mu_j$ to denote the distribution of $L^*(X)$ when
$X \sim \nu_j$, $j = 0, 1$. Similarly we use $\bar{\mu}_0$ (respectively $\underline{\mu}_1$) to denote the
distribution of $L^*(X)$ when $X \sim \bar{\nu}_0$ (respectively $\underline{\nu}_1$). The pair
$(\mathcal{P}_0, \mathcal{P}_1)$ is said to be jointly stochastically bounded by
$(\bar{\nu}_0, \underline{\nu}_1)$ if for all $(\nu_0, \nu_1) \in \mathcal{P}_0 \times \mathcal{P}_1$,

$$\bar{\mu}_0 \succ \mu_0 \quad \text{and} \quad \mu_1 \succ \underline{\mu}_1.$$

Loosely speaking, the LFD from one uncertainty class is the distribution that is nearest to the other
uncertainty class. This notion can be made rigorous in terms of Kullback-Leibler divergence and other
Ali-Silvey distances between distributions in the uncertainty classes, as shown in [22, Corollary 1].

Huber and Strassen [20] have established a procedure to obtain robust solutions to the Neyman-Pearson
hypothesis testing problem provided the uncertainty classes can be described in terms of 2-alternating
capacities. As pointed out in [21], any pair of uncertainty classes that can be described in terms of
2-alternating capacities also satisfies the joint stochastic boundedness (JSB) condition (see [20, Theorem
4.1]). This observation suggests that we can identify examples of uncertainty classes which satisfy the joint
stochastic boundedness condition using the results in [20], [21], and [23]. These include $\epsilon$-contamination
classes, total variation neighborhoods, Prohorov distance neighborhoods, band classes, and p-point classes.
In general it is difficult to identify the distributions $\bar{\nu}_0$ and $\underline{\nu}_1$. However, for
$\epsilon$-contamination classes, total variation neighborhoods, and Levy metric neighborhoods, the method
suggested in [23, pp. 241-248] can be used to identify these distributions.

We show that under certain assumptions on $\mathcal{P}_0$ and $\mathcal{P}_1$, the pair of distributions
$(\bar{\nu}_0, \underline{\nu}_1)$ are LFDs for the robust change detection problem in (4), (5) and (6). Thus
the optimal stopping rules designed assuming known pre-change and post-change distributions $\bar{\nu}_0$
and $\underline{\nu}_1$, respectively, are optimal for the robust problems (4), (5) and (6). We use $E^*_m$
to denote the expectation operator and $P^*_m$ to denote the probability law when the change happens at $m$
and the pre-change and post-change distributions are $\bar{\nu}_0$ and $\underline{\nu}_1$, respectively.

We need the following straightforward result. For completeness we provide a proof in the appendix. 

Lemma III.1. Suppose $\{U_i : 1 \le i \le n\}$ is a set of mutually independent random variables, and
$\{V_i : 1 \le i \le n\}$ is another set of mutually independent random variables such that
$U_i \succ V_i$, $1 \le i \le n$. Now let $h : \mathbb{R}^n \to \mathbb{R}$ be a continuous real-valued
function defined on $\mathbb{R}^n$ that satisfies,

$$h(x_1, \ldots, x_{i-1}, a, x_{i+1}, \ldots, x_n) \ge h(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n),$$

for all $x_1^n \in \mathbb{R}^n$, $a > x_i$, and $i \in \{1, \ldots, n\}$. Then we have,

$$h(U_1, U_2, \ldots, U_n) \succ h(V_1, V_2, \ldots, V_n).$$

B. Lorden criterion 

When the distributions $\nu_0$ and $\nu_1$ are known, the solution to (1) is given by the CUSUM test [?]. The
optimal stopping time is given by,

$$\tau_C = \inf\Big\{n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} L^{\nu}(X_i) \ge \eta\Big\} \tag{7}$$

where $L^{\nu}$ is the log-likelihood ratio between $\nu_1$ and $\nu_0$, and the threshold $\eta$ is chosen so
that $E^{\nu}_{\infty}(\tau_C) = \frac{1}{\alpha}$. The following theorem provides a solution to the robust
Lorden problem when the distributions are unknown.

Theorem III.2. Suppose the following conditions hold:

(i) The uncertainty classes $\mathcal{P}_0, \mathcal{P}_1$ are jointly stochastically bounded by
$(\bar{\nu}_0, \underline{\nu}_1)$.

(ii) All distributions $\nu_0 \in \mathcal{P}_0$ are absolutely continuous with respect to $\bar{\nu}_0$, i.e.,

$$\nu_0 \ll \bar{\nu}_0, \quad \nu_0 \in \mathcal{P}_0. \tag{8}$$

(iii) The function $L^*(\cdot)$, representing the log-likelihood ratio between $\underline{\nu}_1$ and
$\bar{\nu}_0$, is continuous over the support of $\bar{\nu}_0$.

Then the optimal stopping rule that solves (4) is given by the following CUSUM test:

$$\tau^*_C = \inf\Big\{n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} L^*(X_i) \ge \eta\Big\} \tag{9}$$

where the threshold $\eta$ is chosen so that,

$$E^*_{\infty}(\tau^*_C) = \frac{1}{\alpha}.$$

□

We prove the theorem in the appendix. Two brief remarks are in order. Firstly, the discussion in [14,
p. 198] suggests that when LFDs exist under our formulation, they also solve the asymptotic problem,
as expected. Secondly, the robust CUSUM test admits a simple recursive implementation similar to the
ordinary CUSUM test. Clearly,

$$S_{n+1} = S_n^+ + L^*(X_{n+1}) \tag{10}$$

where $S_n = \max_{1 \le k \le n} \sum_{i=k}^{n} L^*(X_i)$ is the test statistic appearing in (9). Thus it is
easy to compute the test statistic recursively.

1) Asymptotic analysis of the robust CUSUM: In general, for any pair of pre-change and post-change
distributions $(\nu_0, \nu_1)$ from the uncertainty classes, we expect the performance of the robust CUSUM test
to be poorer than that of the optimal CUSUM test designed with respect to the correct distributions. The
drop in performance can be interpreted as the cost of robustness. Although it is not easy to characterize
this cost in general, some insight can be obtained by performing an asymptotic analysis in the setting
where the false alarm constraint $\alpha$ goes to zero. Our analysis uses the result of [?, Theorem 2] (also
see [14, Theorem 6.16]). We use $\mathrm{WDD}^{\nu}(\tau^*_C)$ to denote the worst-case delay obtained by
employing the stopping rule $\tau^*_C$ when the pre-change and post-change distributions are given by
$\nu_0$ and $\nu_1$. Similarly, $\mathrm{WDD}^*(\tau^*_C)$ is used to denote the same quantity when the
pre-change and post-change distributions are the LFDs.

As mentioned in the remark following Theorem 2 in [?], we can interpret the robust CUSUM test as
a repeated one-sided sequential probability ratio test (SPRT) between $\underline{\nu}_1$ and $\bar{\nu}_0$.
Let $\tau_{\mathrm{SPRT}}$ denote the stopping rule of the SPRT. We apply [?, Theorem 2] to
$\tau_{\mathrm{SPRT}}$ when the true distributions are the LFDs.
It follows that

$$E^*_{\infty}(\tau^*_C) \ge \frac{1}{\alpha}$$

where $B = \frac{1}{\alpha}$ is used as the upper threshold in the SPRT given by $\tau_{\mathrm{SPRT}}$.
From (30), we know that

$$E^{\nu}_{\infty}(\tau^*_C) \ge E^*_{\infty}(\tau^*_C) \ge \frac{1}{\alpha}.$$

We again apply the theorem to $\tau_{\mathrm{SPRT}}$, but with the true distributions given by any
$\nu_0 \in \mathcal{P}_0$ and $\nu_1 \in \mathcal{P}_1$. We now have,

$$\mathrm{WDD}^{\nu}(\tau^*_C) \le E^{\nu_1}(\tau_{\mathrm{SPRT}})$$

where the expression on the right hand side denotes the expected stopping time of the SPRT when the
observations follow distribution $\nu_1$. Now, by applying the well-known Wald's identity [24] as suggested
in the remark following [?, Theorem 2], we obtain

$$E^{\nu_1}(\tau_{\mathrm{SPRT}}) = \frac{|\log \alpha|}{I_{\nu_1}} (1 + o(1)), \quad \text{as } \alpha \to 0$$

where $o(1) \to 0$ as $\alpha \to 0$ and

$$I_{\nu_1} = \int L^*(x)\, d\nu_1(x) = D(\nu_1 \| \bar{\nu}_0) - D(\nu_1 \| \underline{\nu}_1).$$

Thus

$$\mathrm{WDD}^{\nu}(\tau^*_C) \le \frac{|\log(\alpha)| (1 + o(1))}{D(\nu_1 \| \bar{\nu}_0) - D(\nu_1 \| \underline{\nu}_1)}.$$

It is also known from [?, Theorem 3] that any stopping rule $\tau$ that satisfies the false alarm constraint
$\mathrm{FAR}(\tau) \le \alpha$ must satisfy the lower bound

$$\mathrm{WDD}^{\nu}(\tau) \ge \frac{|\log(\alpha)| (1 + o(1))}{D(\nu_1 \| \nu_0)}$$

and that this lower bound is achieved by the optimal CUSUM test between $\nu_1$ and $\nu_0$. Thus, the
worst-case delay of the robust test is asymptotically larger by a factor no more than

$$\frac{D(\nu_1 \| \nu_0)}{D(\nu_1 \| \bar{\nu}_0) - D(\nu_1 \| \underline{\nu}_1)}$$

when compared with the delay incurred by the optimal test. This factor is thus an upper bound on the
asymptotic cost of robustness.
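
The recursion in (10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the default log-likelihood ratio is the one for an LFD pair of N(0, 1) and N(0.1, 1) borrowed from the Gaussian example of Section IV, and the threshold is passed in directly rather than calibrated to a false alarm constraint.

```python
from typing import Callable, List, Optional, Sequence


def gaussian_lfd_llr(x: float, theta_low: float = 0.1) -> float:
    """Log-likelihood ratio L*(x) between N(theta_low, 1) and N(0, 1).

    The default theta_low = 0.1 is an assumption taken from the Gaussian
    mean shift example; any other LFD pair would need its own L*.
    """
    return theta_low * x - 0.5 * theta_low ** 2


def cusum_statistics(xs: Sequence[float],
                     llr: Callable[[float], float]) -> List[float]:
    """Return [S_1, ..., S_n] via S_{n+1} = max(S_n, 0) + L*(X_{n+1})."""
    stats: List[float] = []
    s_plus = 0.0  # plays the role of S_0^+, so that S_1 = L*(X_1)
    for x in xs:
        s = s_plus + llr(x)
        stats.append(s)
        s_plus = max(s, 0.0)
    return stats


def cusum_statistic_direct(xs: Sequence[float],
                           llr: Callable[[float], float]) -> float:
    """S_n = max over 1 <= k <= n of sum_{i=k}^n L*(X_i), by brute force."""
    increments = [llr(x) for x in xs]
    return max(sum(increments[k:]) for k in range(len(increments)))


def cusum_stopping_time(xs: Sequence[float],
                        llr: Callable[[float], float],
                        threshold: float) -> Optional[int]:
    """First n >= 1 with S_n >= threshold, or None if the sample ends first."""
    for n, s in enumerate(cusum_statistics(xs, llr), start=1):
        if s >= threshold:
            return n
    return None
```

The brute-force function is included only to check the recursion against the definition of $S_n$; in practice one would run the recursive form, which needs constant memory per observation.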
C. Shiryaev-Roberts-Pollak (SRP) criterion

The SRP stopping rule is asymptotically optimal for (2). Let $R_0$ be a random variable with distribution
$\psi$ supported on $\mathbb{R}_+$ and define,

$$R^{\nu}_n = \exp\big(L^{\nu}(X_n)\big)\,(1 + R^{\nu}_{n-1}), \quad n \ge 1. \tag{11}$$

When the distributions $\nu_0$ and $\nu_1$ are known the SRP stopping rule is given by

$$\tau^{\eta,\psi}_{\mathrm{SRP}} = \inf\{n \ge 0 : R^{\nu}_n \ge \eta\}. \tag{12}$$
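
As a rough illustration of (11) and (12), the sketch below runs the recursion for a given starting value. It assumes that $L^{\nu}$ denotes the log-likelihood ratio, so each step multiplies by the likelihood ratio $\exp(L^{\nu}(X_n))$. The function name and the `r0` argument are hypothetical: the caller is expected to draw `r0` from the distribution of $R_0$, or to fix it at a constant.

```python
import math
from typing import Callable, Iterable, Optional


def sr_stopping_time(observations: Iterable[float],
                     llr: Callable[[float], float],
                     threshold: float,
                     r0: float = 0.0) -> Optional[int]:
    """Return inf{n >= 0 : R_n >= threshold}, or None if never reached.

    R_n follows R_n = exp(llr(X_n)) * (1 + R_{n-1}) with R_0 = r0.
    """
    r = r0
    if r >= threshold:
        return 0  # the rule may stop before any observation is taken
    for n, x in enumerate(observations, start=1):
        r = math.exp(llr(x)) * (1.0 + r)
        if r >= threshold:
            return n
    return None
```

With a constant likelihood ratio of 2 and `r0 = 0`, the statistic takes the values 2, 6, 14, so a threshold of 10 is first reached at the third observation.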

Asymptotic optimality property: The SRP test of (12) is asymptotically optimal for (2) in the following
sense [5]: For every $0 < \alpha < 1$ there exists a threshold $\eta$ and a probability measure $\psi_\eta$
such that the stopping rule $\tau_{\mathrm{SRP}} := \tau^{\eta,\psi_\eta}_{\mathrm{SRP}}$ satisfies
$\mathrm{FAR}(\tau_{\mathrm{SRP}}) = \alpha$ and for any other stopping rule $\tau$ that satisfies the false
alarm constraint $\mathrm{FAR}(\tau) \le \alpha$, we have

$$J_{\mathrm{SRP}}(\tau) \ge J_{\mathrm{SRP}}(\tau_{\mathrm{SRP}}) + o(1) \tag{13}$$

where $o(1) \to 0$ as $\alpha \to 0$.

The following theorem identifies a stopping rule that extends the above asymptotic optimality property 
to the setting where the post-change distribution is unknown. 

Theorem III.3. Suppose the following conditions hold:

(i) The uncertainty class $\mathcal{P}_0$ is a singleton $\mathcal{P}_0 = \{\nu_0\}$ and the pair
$(\mathcal{P}_0, \mathcal{P}_1)$ is jointly stochastically bounded by $(\nu_0, \underline{\nu}_1)$.

(ii) The function $L^*(\cdot)$, representing the log-likelihood ratio between $\underline{\nu}_1$ and
$\nu_0$, is continuous over the support of $\nu_0$.

Let $\tau^*_{\mathrm{SRP}} := \tau^{\eta,\psi_\eta}_{\mathrm{SRP}}$ denote the SRP stopping rule defined with
respect to the LFDs $(\nu_0, \underline{\nu}_1)$, with parameters $\eta$ and $\psi_\eta$ chosen such that the
asymptotic optimality property of (13) is satisfied. Then the stopping rule $\tau^*_{\mathrm{SRP}}$ is also
asymptotically optimal for (5) in the following sense: For every $0 < \alpha < 1$ and for any stopping rule
$\tau$ that satisfies the false alarm constraint $\mathrm{FAR}(\tau) \le \alpha$, we have

$$\sup_{\nu_1 \in \mathcal{P}_1} J_{\mathrm{SRP}}(\tau) \ge \sup_{\nu_1 \in \mathcal{P}_1} J_{\mathrm{SRP}}(\tau^*_{\mathrm{SRP}}) + o(1) \tag{14}$$

where $o(1) \to 0$ as $\alpha \to 0$. □

The result of (14) can be interpreted as follows: the worst-case delay incurred by the stopping rule
$\tau^*_{\mathrm{SRP}}$ can exceed that of any other stopping rule $\tau$ by at most a quantity that
approaches zero as the false alarm constraint $\alpha$ approaches zero.

Our proof, provided in the appendix, is useful only when $\mathcal{P}_0$ is a singleton. It is possible that
the asymptotic optimality result may still hold even for general $\mathcal{P}_0$, although the current proof
is not applicable. We elaborate on this further in the discussion in the next section on the Bayesian
criterion, and also in the appendix following the proof of the theorem.

We also note that in some cases our proof can be adapted to obtain tests that are exactly optimal for the
robust SRP criterion of (5). Polunchenko et al. [25] study the Shiryaev-Roberts procedure (SR-r) which
is identical to the SRP procedure described earlier, except for the fact that $R_0$ is not random but fixed at
some constant $r$. Theorem 2 of [25] shows the exact non-asymptotic optimality of the SR-r procedure
for detecting a change in distribution from Exp(1) to Exp(2), where Exp($\theta$) refers to an exponential
distribution with mean $\theta$. Using that result, the proof of Theorem III.3 can be adapted to obtain the
exact robust solution to the optimization problem in (5). In particular it can be shown that the SR-r
procedure for detecting change from Exp(1) to Exp(2) given in [25, Theorem 2] is also optimal for (5)
when $\mathcal{P}_0 = \{\mathrm{Exp}(1)\}$ and $\mathcal{P}_1 = \{\mathrm{Exp}(\theta) : \theta \ge 2\}$.

D. Bayesian criterion 

When the distributions $\nu_0$ and $\nu_1$ are known and the prior distribution of the change-point is
geometric, the solution to (3) is given by the Shiryaev test [10]. Denoting the parameter of the geometric
distribution by $p$, we have,

$$\pi_k = p(1 - p)^{k-1}, \quad k \ge 1.$$

The Shiryaev stopping rule is based on comparing the posterior probability of change to a threshold $\eta'$:

$$\tau_S = \inf\{n \ge 1 : P^{\nu}(\Lambda \le n \mid \mathcal{F}_n) \ge \eta'\}.$$

It can be equivalently expressed as,

$$\tau_S = \inf\Big\{n \ge 1 : \log\Big(\sum_{k=1}^{n} \pi_k \exp\Big(\sum_{i=k}^{n} L^{\nu}(X_i)\Big)\Big) \ge \eta\Big\} \tag{15}$$

where the threshold $\eta$ is chosen such that $\mathrm{PFA}(\tau_S) = P^{\nu}(\tau_S < \Lambda) = \alpha$.
The following theorem, proved in the appendix, identifies a solution to the robust Shiryaev problem (6).

Theorem III.4. Suppose the following conditions hold:

(i) The uncertainty class $\mathcal{P}_0$ is a singleton $\mathcal{P}_0 = \{\nu_0\}$ and the pair
$(\mathcal{P}_0, \mathcal{P}_1)$ is jointly stochastically bounded by $(\nu_0, \underline{\nu}_1)$.

(ii) The prior distribution of the change-point is a geometric distribution.

(iii) The function $L^*(\cdot)$, representing the log-likelihood ratio between $\underline{\nu}_1$ and
$\nu_0$, is continuous over the support of $\nu_0$.

Then the optimal stopping rule that solves (6) is given by the following Shiryaev test:

$$\tau^*_S = \inf\Big\{n \ge 1 : \log\Big(\sum_{k=1}^{n} \pi_k \exp\Big(\sum_{i=k}^{n} L^*(X_i)\Big)\Big) \ge \eta\Big\} \tag{16}$$

where the threshold $\eta$ is chosen so that $P^*(\tau^*_S < \Lambda) = \alpha$. □

We note that our results under the Bayesian and SRP criteria are applicable only when the pre-change
distribution is known exactly and hence these results are weaker than our result under the Lorden criterion.
Suppose $\mathcal{P}_0$ is not a singleton and $(\mathcal{P}_0, \mathcal{P}_1)$ is jointly stochastically
bounded by $(\bar{\nu}_0, \underline{\nu}_1)$. In this case, the stopping rule $\tau^*_S$ defined with respect
to $(\bar{\nu}_0, \underline{\nu}_1)$ is not optimal for the robust Bayesian criterion (6). In particular, when
the pre-change distribution is $\nu_0 \ne \bar{\nu}_0$ and the post-change distribution is
$\nu_1 = \underline{\nu}_1$, it can be shown that the average detection delay
$\mathrm{ADD}^{\nu}(\tau^*_S)$ of the stopping rule $\tau^*_S$ is in general higher than the average detection
delay $\mathrm{ADD}^*(\tau^*_S)$ when the pre-change and post-change distributions are
$(\bar{\nu}_0, \underline{\nu}_1)$. This is because the likelihood ratios of the pre-change observations
appearing in (16) are stochastically larger under $\bar{\nu}_0$ than under $\nu_0$. This leads to a stopping
time that is stochastically smaller under $(\bar{\nu}_0, \underline{\nu}_1)$ than under
$(\nu_0, \underline{\nu}_1)$. Hence there is no reason to believe that $\tau^*_S$ solves the robust problem (6).

Even in the case of the SRP criterion studied in Section III-C our robust result holds only when
$\mathcal{P}_0$ is a singleton and the JSB condition holds. However, unlike in the Bayesian case, we do not
have a simple explanation for why the result cannot be extended to the setting where the pre-change
distribution is not known exactly. It is possible that for some specific choices of the uncertainty classes,
the stopping rule designed with respect to $(\bar{\nu}_0, \underline{\nu}_1)$ may be asymptotically optimal for
the robust problem of (5), although we do not expect this to be true in general.

However, such a problem does not arise for the robust CUSUM test we studied in Section III-B since
the worst-case detection delay $\mathrm{WDD}^{\nu}(\tau^*_C)$ of the robust CUSUM depends only on the support
of the pre-change distribution when the post-change distribution is kept fixed at
$\nu_1 = \underline{\nu}_1$.

Comparison with robust sequential detection: It is interesting to compare our results with some known
results on robust sequential detection. We have shown that provided the JSB condition and other regularity
conditions hold, change detection tests designed with respect to the LFDs exactly solve the minimax
robust change detection problem under the Lorden and Bayesian criteria. However, the known minimax
optimality results in robust sequential detection are all for the asymptotic settings - as error probabilities
go to zero [11] or as the size of the uncertainty classes diminishes [12]. Huber [11] showed that an
exact minimax result does not hold for the robust sequential detection problem in general. He provided
examples where the expected stopping times of the SPRT designed with respect to the LFDs are not least
favorable under the LFDs. This is similar to the reason why the robust Shiryaev test is not optimal for
the Bayesian problem when $\mathcal{P}_0$ is not a singleton, as explained above.

Fig. 1. Comparison of robust and non-robust Shiryaev tests for $\alpha = 0.001$ for the Gaussian mean shift example.

IV. Some examples and simulation results 

A. Gaussian mean shift 

Here we consider a simple example to illustrate the results. Assume $\nu_0$ is known to be a standard
Gaussian distribution with mean zero and unit variance, so that $\mathcal{P}_0$ is a singleton. Let
$\mathcal{P}_1$ be the collection of Gaussian distributions with means from the interval [0.1, 3] and unit
variance.

$$\mathcal{P}_0 = \{\mathcal{N}(0, 1)\}, \qquad \mathcal{P}_1 = \{\mathcal{N}(\theta, 1) : \theta \in [0.1, 3]\} \tag{17}$$

It is easily verified that $(\mathcal{P}_0, \mathcal{P}_1)$ is jointly stochastically bounded by
$(\bar{\nu}_0, \underline{\nu}_1)$ given by

$$\bar{\nu}_0 \sim \mathcal{N}(0, 1), \qquad \underline{\nu}_1 \sim \mathcal{N}(0.1, 1).$$
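
The joint stochastic boundedness claim can be checked by a short calculation. The derivation below is a sketch for the Gaussian classes in (17) only; the symbol $Q$ for the standard Gaussian tail function is introduced here for illustration and does not appear elsewhere in the paper.

```latex
% Log-likelihood ratio between the candidate LFDs N(0.1, 1) and N(0, 1)
L^*(x) = \log\frac{\exp(-(x - 0.1)^2/2)}{\exp(-x^2/2)} = 0.1\,x - 0.005 .

% Law of L^*(X) when X ~ N(\theta, 1): an affine map of a Gaussian
X \sim \mathcal{N}(\theta, 1)
  \;\Longrightarrow\;
  L^*(X) \sim \mathcal{N}(0.1\,\theta - 0.005,\; 0.01),
\qquad
P\big(L^*(X) \ge t\big) = Q\!\left(\frac{t - 0.1\,\theta + 0.005}{0.1}\right).

% Q is decreasing, so the tail probability is nondecreasing in \theta.
% Hence \theta = 0.1 gives the stochastically smallest law on [0.1, 3]:
\mu_1 \succ \underline{\mu}_1 \quad \text{for every } \nu_1 \in \mathcal{P}_1 .

% The pre-change class is a singleton, so \bar{\mu}_0 \succ \mu_0 holds trivially.
```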

1) Bayesian criterion: We simulated the Bayesian and robust Bayesian change detection tests for this
problem assuming a geometric prior distribution for the change-point with parameter 0.1 and a false
alarm constraint of $\alpha = 0.001$. From the performance curves plotted in Figure 1 we can see that the
robust Shiryaev test gives the same average detection delay (ADD) as the optimal Shiryaev test at
$\underline{\nu}_1$, which corresponds to $\theta = 0.1$ in the figure. This is expected since the robust test
is identical to the optimal test at $\underline{\nu}_1$. For all other values of $\nu_1 \in \mathcal{P}_1$,
the performance of the robust test is strictly better than the performance at $\underline{\nu}_1$ and hence
this test is indeed minimax optimal. We also see in Figure 1 that the average delays obtained with the robust
test are much higher than those obtained with the optimal test, especially at high values of the mean
$\theta$. The probability of false alarm and average detection delay were estimated via Monte-Carlo
simulations with a standard deviation of 0.1% for the estimates.
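
The statistic inside the logarithm in (15) and (16), call it $G_n$, satisfies the recursion $G_n = (G_{n-1} + \pi_n)\exp(L(X_n))$ with $G_0 = 0$, which follows by factoring $\exp(L(X_n))$ out of the sum. The Python sketch below uses this recursion and checks it against the direct sum. It is an illustration rather than the simulation code behind Figure 1: the function names are hypothetical, the log-likelihood ratio is supplied by the caller, and the threshold is taken as given instead of being calibrated to a PFA constraint.

```python
import math
from typing import Callable, List, Optional, Sequence


def shiryaev_log_statistics(xs: Sequence[float],
                            llr: Callable[[float], float],
                            p: float) -> List[float]:
    """Return [log G_1, ..., log G_n] for a geometric prior with parameter p.

    Uses G_n = (G_{n-1} + pi_n) * exp(llr(X_n)), pi_n = p * (1 - p)**(n - 1).
    Works in the linear domain, so it may overflow on very long post-change runs.
    """
    g = 0.0
    out: List[float] = []
    for n, x in enumerate(xs, start=1):
        prior_n = p * (1.0 - p) ** (n - 1)
        g = (g + prior_n) * math.exp(llr(x))
        out.append(math.log(g))
    return out


def shiryaev_log_statistic_direct(xs: Sequence[float],
                                  llr: Callable[[float], float],
                                  p: float) -> float:
    """log of sum_{k=1}^n pi_k * exp(sum_{i=k}^n llr(X_i)), by brute force."""
    increments = [llr(x) for x in xs]
    total = 0.0
    for k in range(1, len(xs) + 1):
        total += p * (1.0 - p) ** (k - 1) * math.exp(sum(increments[k - 1:]))
    return math.log(total)


def shiryaev_stopping_time(xs: Sequence[float],
                           llr: Callable[[float], float],
                           p: float,
                           log_threshold: float) -> Optional[int]:
    """First n >= 1 with log G_n >= log_threshold, or None if never reached."""
    for n, value in enumerate(shiryaev_log_statistics(xs, llr, p), start=1):
        if value >= log_threshold:
            return n
    return None
```

For the robust test of this example one would pass the LFD log-likelihood ratio $L^*(x) = 0.1x - 0.005$ and `p = 0.1`.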

2) Lorden criterion and comparison with GLR test: Under the Lorden criterion, we compared the
performances of three tests - the optimal CUSUM test with known $\theta$, the robust CUSUM test designed
with respect to the LFDs, and the CUSUM test based on the Generalized Likelihood Ratio (GLR test)
suggested in [?]. The stopping time under the GLR test is given by

$$\tau_{\mathrm{GLR}} = \inf\Big\{n \ge 1 : \max_{1 \le k \le n} \sup_{\nu_1 \in \mathcal{P}_1} \sum_{i=k}^{n} L^{\nu}(X_i) \ge \eta\Big\} \tag{18}$$

where $\eta$ is chosen so that the false alarm constraint is met with equality. The GLR test does not require
knowledge of $\theta$ but still achieves the same asymptotic performance as the optimal CUSUM test with
known $\theta$ when the false alarm constraint goes to zero, for some choices of the uncertainty classes
including the example considered above.

Figure [2] and Table U shows estimates of the worst-case detection delay (WDD) obtained under the 
these tests designed for a false alarm constraint of a = 0.001, for various values of 9. These values are 
estimated using Monte-Carlo simulations. The delay values have a standard deviation lower than 1% and 
the false alarm value has a standard deviation lower than 3%. 

From the performance curves in Figure [2] and the values in Table U we see that the GLR test gives 
better performance than our robust solution at higher values of 9, and is close to optimal at these high 
values of 9. However, the robust test gives much better performance than the GLR test at the low values 
of 9. This is expected since the robust solution is minimax optimal and hence is expected to perform 
better at the unfavorable values of 9. 

An important difference between the two solutions is that although the robust CUSUM test based on
the LFDs admits a simple recursive implementation as described in (10), the GLR test is in general
very complex to implement. This is because the supremum in (18) may be achieved at different values of
$\nu_1$ for different $n$. Furthermore, the optimization in (18) may not be easy to solve for general uncertainty
classes, particularly non-parametric classes like the $\epsilon$-uncertainty classes considered next.
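
The recursive form of a CUSUM test can be sketched in a few lines of Python. The sketch below assumes that the per-sample log-likelihood ratios between the LFDs have already been computed and are passed in as a list; the function names are hypothetical. A brute-force version that evaluates the maximum over all window starts is included only to check that the two agree.

```python
def cusum_recursive(llrs, eta):
    """Stopping time of the CUSUM test via W_n = max(W_{n-1}, 0) + llr_n."""
    w = 0.0
    for n, llr in enumerate(llrs, start=1):
        w = max(w, 0.0) + llr
        if w >= eta:
            return n
    return None


def cusum_bruteforce(llrs, eta):
    """Same stopping time from the definition max_{1<=k<=n} sum_{i=k}^{n} llr_i."""
    for n in range(1, len(llrs) + 1):
        if max(sum(llrs[k:n]) for k in range(n)) >= eta:
            return n
    return None
```

The recursion needs constant memory and constant work per sample, whereas the GLR statistic generally needs the whole observation history.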

Fig. 2. Comparison of various tests for a false alarm rate of $\alpha = 0.001$ for the Gaussian mean shift example.

TABLE I

Delays obtained using various tests under the Lorden criterion for a false alarm rate of $\alpha = 0.001$.

$\theta$    Optimal CUSUM    Robust CUSUM    GLR test
0.1         242.7            242.7           496
0.2         111.5            116.8           184
0.4         43.2             55.6            57.2
0.6         23.5             36.3            28.6
1.0         10.5             21.5            12.35



B. $\epsilon$-contamination classes

We now discuss an example in which the uncertainty class $\mathcal{P}_0$ is no longer a singleton. For some scalar
$\epsilon \in (0, 1)$, consider the following $\epsilon$-contamination classes:



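$$\mathcal{P}_0 = \{ \nu_0 : \nu_0 = (1 - \epsilon)\,\mathcal{N}(0, 1) + \epsilon H_0,\ H_0 \in \mathcal{P}(\mathbb{R}) \} \tag{19}$$

$$\mathcal{P}_1 = \{ \nu_1 : \nu_1 = (1 - \epsilon)\,\mathcal{N}(1, 1) + \epsilon H_1,\ H_1 \in \mathcal{P}(\mathbb{R}) \} \tag{20}$$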



where $\mathcal{P}(\mathbb{R})$ is the collection of all probability measures on $\mathbb{R}$ and $\mathcal{N}(\mu, \sigma)$ denotes the probability
measure corresponding to a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. In other words,
the distributions in uncertainty class $\mathcal{P}_i$ are mixtures of a Gaussian distribution with mean $i$ and unit
variance, and an arbitrary probability distribution on $\mathbb{R}$, with weights given by $1 - \epsilon$ and $\epsilon$ respectively.
Following the method outlined in [11], we identified LFDs for these uncertainty classes and evaluated
the performance of the robust test. Let $p_i$ denote the density function of a $\mathcal{N}(i, 1)$ random variable and
let $q_i$ denote the density function of the least favorable distribution from $\mathcal{P}_i$. It is established in [11] that
the densities of the LFDs have the following structure,

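$$q_0(x) = \begin{cases} (1 - \epsilon)\, p_0(x) & \text{if } L(x) < b \\ \frac{1}{b}(1 - \epsilon)\, p_1(x) & \text{if } L(x) \geq b \end{cases} \tag{21}$$

$$q_1(x) = \begin{cases} (1 - \epsilon)\, p_1(x) & \text{if } L(x) > a \\ a(1 - \epsilon)\, p_0(x) & \text{if } L(x) \leq a \end{cases} \tag{22}$$

where $L(x) = \frac{p_1(x)}{p_0(x)}$. The scalars $a$ and $b$ are identified by the following relations: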



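$$(1 - \epsilon) \int_{\{x : L(x) < b\}} p_0(x)\,dx + \frac{1 - \epsilon}{b} \int_{\{x : L(x) \geq b\}} p_1(x)\,dx = 1$$

$$(1 - \epsilon) \int_{\{x : L(x) > a\}} p_1(x)\,dx + a(1 - \epsilon) \int_{\{x : L(x) \leq a\}} p_0(x)\,dx = 1.$$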
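
For the Gaussian nominal densities used here, the two normalization relations can be solved numerically. The following Python sketch is one way to do this under stated assumptions: it uses $L(x) = \exp(x - 1/2)$ for $p_0 = \mathcal{N}(0,1)$ and $p_1 = \mathcal{N}(1,1)$, so that $\{L(x) < b\}$ is the half-line $x < \ln b + 1/2$, and it assumes $\epsilon$ is small enough that the solution satisfies $a < 1 < b$. The function names and the bisection brackets are hypothetical choices, not part of the paper.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_sf(x):
    """Standard normal upper tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mass_q0(b, eps):
    """Total mass of q0 in (21) as a function of b."""
    c = math.log(b) + 0.5  # L(x) < b  <=>  x < c
    return (1.0 - eps) * (norm_cdf(c) + norm_sf(c - 1.0) / b)


def mass_q1(a, eps):
    """Total mass of q1 in (22) as a function of a."""
    c = math.log(a) + 0.5  # L(x) > a  <=>  x > c
    return (1.0 - eps) * (norm_sf(c - 1.0) + a * norm_cdf(c))


def bisect(func, lo, hi, iters=200):
    """Root of func on [lo, hi], assuming func changes sign on the bracket."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_thresholds(eps):
    """Return (a, b) that make q1 and q0 integrate to one."""
    b = bisect(lambda t: mass_q0(t, eps) - 1.0, 1.0, 1e6)
    a = bisect(lambda t: mass_q1(t, eps) - 1.0, 1e-6, 1.0)
    return a, b
```

Calling `solve_thresholds(0.05)` returns the pair for $\epsilon = 0.05$. Because this particular pair of nominal densities is symmetric about $x = 1/2$, the two thresholds should satisfy $ab = 1$, which gives a quick sanity check on the numerical solution.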



In order to compare the performance of the robust test with that of the optimal test we chose the
following distributions for $H_0$ and $H_1$:

$$H_0 = \mathcal{N}(0, \sigma_0),\ \sigma_0 \in [0.1, 10], \qquad H_1 = \mathcal{N}(1, \sigma_1),\ \sigma_1 \in [0.1, 10].$$

Table II shows the values of the worst-case delay (WDD) obtained when $\sigma_0$ is kept fixed at $\sigma_0 = 1$
and $\sigma_1$ is varied. Shown are the results obtained using the robust CUSUM test as well as the optimal
CUSUM test for $\epsilon = 0.05$ and for $\epsilon = 0.005$. We notice that the difference in performance between the
robust test and the optimal test is larger for larger values of $\epsilon$. This matches the intuition that the cost
of robustness would be higher for a larger uncertainty class of distributions. The delay values and false
alarm rates were estimated to have standard deviations lower than 0.1% and 1% respectively.

Table III shows the values of worst-case delay obtained under the optimal CUSUM tests when $\sigma_1$ is
kept fixed at $\sigma_1 = 1$ and $\sigma_0$ is varied. The delay values and false alarm rates were estimated to have
standard deviations lower than 0.1% and 1% respectively. We have not included the delays obtained
under the robust test, since the delay of the robust test does not vary with $\sigma_0$. The delays obtained under
the robust test for $\epsilon = 0.05$ and $\epsilon = 0.005$ are respectively 15.09 and 11.27, as shown in the third row of
Table II corresponding to $\sigma_1 = 1$.

V. Conclusion 

We have shown that for uncertainty classes that satisfy some specific conditions, the optimal change 
detectors designed for the least favorable distributions are optimal in a minimax sense. This is shown for

TABLE II

Delays obtained using various tests under the Lorden criterion for $\epsilon$-uncertainty classes with
$\alpha = 0.001$ and $\sigma_0 = 1$.

              $\epsilon = 0.05$                   $\epsilon = 0.005$
$\sigma_1$    Robust CUSUM    Optimal CUSUM    Robust CUSUM    Optimal CUSUM
0.1           14.77           9.17             11.27           10.38
0.5           14.86           9.12             11.27           10.39
1             15.09           9.08             11.27           10.35
5             15.52           8.78             11.29           10.33
10            15.59           8.65             11.29           10.34



TABLE III

Delays obtained using the optimal CUSUM test for $\epsilon$-uncertainty classes with $\alpha = 0.001$ and $\sigma_1 = 1$.

$\sigma_0$    Optimal CUSUM for $\epsilon = 0.05$    Optimal CUSUM for $\epsilon = 0.005$
0.1           10.56                                  10.55
0.5           10.50                                  10.52
1             10.44                                  10.56
5             10.02                                  10.58
10            9.85                                   10.59



the Lorden criterion, the Shiryaev-Roberts-Pollak criterion, and Shiryaev's Bayesian criterion. However, 
robustness comes at a potential cost. The optimal stopping rule designed for the LFDs may perform 
quite sub-optimally for other distributions from the uncertainty class when compared with the optimal 
performance that can be obtained in the case where these distributions are known exactly. Using an 
asymptotic analysis, we have also obtained an analytic upper bound on this cost of robustness for the 
robust solution under the Lorden criterion. Nevertheless, for some parameter ranges our robust test obtains
significant performance improvement over the CUSUM test designed for the Generalized Likelihood Ratio 
statistic, which is a benchmark for the composite quickest change detection problem. Our robust solution 
also has the added advantage that it can be implemented in a simple recursive manner, while the GLR 
test does not admit a recursive solution in general, and may require the solution to a complex non-convex 
optimization problem at every time instant. 



June 2, 2010 



DRAFT 




Appendix 

A. Proof of Lemma III.1

We prove this claim by induction. For $n = 1$, the claim holds because if $h : \mathbb{R} \to \mathbb{R}$ is a non-decreasing
continuous function we have,

$$\begin{aligned}
P(h(U_1) \geq t) &= P(U_1 \geq \sup\{x : h(x) < t\}) \\
&\geq P(V_1 \geq \sup\{x : h(x) < t\}) \\
&= P(h(V_1) \geq t).
\end{aligned}$$

Assume the claim is true for $n = N$ and now consider $n = N + 1$. For any fixed $x_1^N = (x_1, \ldots, x_N) \in \mathbb{R}^N$, since the
function $h$ is non-decreasing in each of its components, it follows by the proof for $n = 1$ that,

$$h(x_1, x_2, \ldots, x_N, U_{N+1}) \succ h(x_1, x_2, \ldots, x_N, V_{N+1}). \tag{23}$$

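We further have,

\begin{align}
P(h(U_1, U_2, \ldots, U_{N+1}) \geq t)
&= \int f_{U_1^N}(x_1^N)\, P(h(x_1, x_2, \ldots, x_N, U_{N+1}) \geq t)\, dx_1^N \notag \\
&\geq \int f_{U_1^N}(x_1^N)\, P(h(x_1, x_2, \ldots, x_N, V_{N+1}) \geq t)\, dx_1^N \tag{24} \\
&= P(h(U'_1, U'_2, \ldots, U'_N, V_{N+1}) \geq t) \tag{25} \\
&= \int f_{V_{N+1}}(y)\, P(h(U'_1, U'_2, \ldots, U'_N, y) \geq t)\, dy \notag \\
&\geq \int f_{V_{N+1}}(y)\, P(h(V_1, V_2, \ldots, V_N, y) \geq t)\, dy \tag{26} \\
&= P(h(V_1, V_2, \ldots, V_{N+1}) \geq t) \notag
\end{align}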



where (24) is obtained via (23). The variables $U'_i$ appearing in (25) are random variables with exactly the
same statistics as $U_i$ and independent of the $V_i$'s. The inequality of (26) is obtained by using the induction
hypothesis for $n = N$. Thus we have shown that,

$$h(U_1, U_2, \ldots, U_{N+1}) \succ h(V_1, V_2, \ldots, V_{N+1})$$

which proves the lemma by the principle of mathematical induction. □

B. Proof of Theorem III.2

Proof: Suppose $\mathcal{P}_0$ and $\mathcal{P}_1$ satisfy the conditions of the theorem. Since the CUSUM test is optimal
for known distributions, it is clear that the test given in (O is optimal when the pre- and post-change
distributions are $\overline{\nu}_0$ and $\underline{\nu}_1$, respectively. Hence, it suffices to show that the values of WDD($\tau^*$) and
FAR($\tau^*$) obtained under any $\nu_0 \in \mathcal{P}_0$ and any $\nu_1 \in \mathcal{P}_1$ are no higher than their respective values when the
pre- and post-change distributions are $\overline{\nu}_0$ and $\underline{\nu}_1$. We use $Y_i^*$ to denote the random variable $L^*(X_i)$ when
the pre-change and post-change distributions of the observations from the sequence $\{X_i : i = 1, 2, \ldots\}$
are $\overline{\nu}_0$ and $\underline{\nu}_1$, respectively, and $Y_i^{\nu}$ to denote the random variable $L^*(X_i)$ when the pre- and post-change
distributions are $\nu_0$ and $\nu_1$, respectively. We first prove the theorem for a special case.

Case 1: $\mathcal{P}_0$ is a singleton given by $\mathcal{P}_0 = \{\nu_0\}$.

Clearly, in this case $\overline{\nu}_0 = \nu_0$ and (8) is met trivially. Furthermore, in this case, the false alarm constraint
is also met trivially since the false alarm rate obtained by using the stopping rule $\tau^*$ is independent of
the true value of the post-change distribution. Fix the change-point to be $\lambda$. Now, to complete the proof
for the scenario where $\mathcal{P}_0$ is a singleton, we will show that for all $\lambda \geq 1$,

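$$E_\lambda^{*}[(\tau^* - \lambda + 1)^+ \mid \mathcal{F}_{\lambda-1}] \succ E_\lambda^{\nu}[(\tau^* - \lambda + 1)^+ \mid \mathcal{F}_{\lambda-1}] \tag{27}$$

which will establish that the value of WDD($\tau^*$), obtained under any $\nu_1 \in \mathcal{P}_1$, is no higher than the value
when the true post-change distribution is $\underline{\nu}_1$.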

Since we now have $\overline{\nu}_0 = \nu_0$, both $Y_i^*$ and $Y_i^{\nu}$ have the same distributions for $i < \lambda$ and hence
we assume without loss of generality that for all $i < \lambda$, $Y_i^* = Y_i^{\nu}$ with probability one. Under this
assumption, we will show that for all integers $N \geq 0$, the following relation holds with probability one,

$$P_\lambda^{*}((\tau^* - \lambda + 1)^+ \leq N \mid \mathcal{F}_{\lambda-1}) \leq P_\lambda^{\nu}((\tau^* - \lambda + 1)^+ \leq N \mid \mathcal{F}_{\lambda-1}), \tag{28}$$

which will then establish (27). Since $\tau^*$ is a stopping time, the event $\{(\tau^* - \lambda + 1)^+ \leq 0\}$ is $\mathcal{F}_{\lambda-1}$-measurable.
Hence, with probability one, (28) holds with equality for $N = 0$. Now it suffices to verify
(28) for $N \geq 1$. We know by the stochastic ordering condition on $\mathcal{P}_1$ that,

$$Y_i^{\nu} \succ Y_i^{*}, \quad \text{for all } i \geq \lambda. \tag{29}$$

Now we have the following equivalence between two events:

$$\{\tau^* \leq N\} = \Big\{ \max_{1 \leq n \leq N} \max_{1 \leq k \leq n} \sum_{i=k}^{n} L^*(X_i) \geq \eta \Big\} = \Big\{ \max_{1 \leq k \leq n \leq N} \sum_{i=k}^{n} L^*(X_i) \geq \eta \Big\}.$$

It is easy to see that the function,

$$f(x_1, \ldots, x_N) = \max_{1 \leq k \leq n \leq N} \sum_{i=k}^{n} x_i$$

is continuous and non-decreasing in each of its components as required by Lemma III.1. Hence for
$N \geq 1$, the following hold with probability one:

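$$\begin{aligned}
P_\lambda^{*}((\tau^* - \lambda + 1)^+ \leq N \mid \mathcal{F}_{\lambda-1})
&= P_\lambda^{*}(\tau^* \leq N + \lambda - 1 \mid \mathcal{F}_{\lambda-1}) \\
&= P_\lambda(f(Y_1^{*}, \ldots, Y_{N+\lambda-1}^{*}) \geq \eta \mid \mathcal{F}_{\lambda-1}) \\
&\leq P_\lambda(f(Y_1^{\nu}, \ldots, Y_{N+\lambda-1}^{\nu}) \geq \eta \mid \mathcal{F}_{\lambda-1}) \\
&= P_\lambda^{\nu}(\tau^* \leq N + \lambda - 1 \mid \mathcal{F}_{\lambda-1}) \\
&= P_\lambda^{\nu}((\tau^* - \lambda + 1)^+ \leq N \mid \mathcal{F}_{\lambda-1})
\end{aligned}$$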
where the inequality follows from Lemma III.1 and (29), using the fact that $f$ is a non-decreasing function
with respect to its last $N$ arguments and the fact that $Y_i^{\nu} = Y_i^{*}$ for $i < \lambda$. Thus, for all integers $N \geq 0$,
(28) holds with probability one and hence (27) is satisfied. This proves the result for the case where $\mathcal{P}_0$
is a singleton.
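
The role played by Lemma III.1 in this argument can be checked numerically on a toy example. The Python sketch below is only an illustration under stated assumptions: it uses small discrete distributions (hypothetical values, not from the paper), computes the exact distribution of the maximum partial sum $f$ by enumeration, and verifies that stochastic dominance of the inputs carries over to the outputs. The helper names are hypothetical.

```python
from itertools import product


def survival(dist, t):
    """P(X >= t) for a discrete distribution given as {value: probability}."""
    return sum(p for x, p in dist.items() if x >= t)


def dominates(dist_u, dist_v, tol=1e-12):
    """True if P(U >= t) >= P(V >= t) for all t (checking support points suffices)."""
    points = set(dist_u) | set(dist_v)
    return all(survival(dist_u, t) >= survival(dist_v, t) - tol for t in points)


def pushforward(h, dists):
    """Exact distribution of h(X_1, ..., X_n) for independent discrete X_i."""
    out = {}
    for combo in product(*(d.items() for d in dists)):
        values = [x for x, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p
        y = h(values)
        out[y] = out.get(y, 0.0) + prob
    return out


def max_partial_sum(xs):
    """f(x_1, ..., x_N) = max over 1 <= k <= n <= N of sum_{i=k}^{n} x_i."""
    best = float("-inf")
    run = 0.0
    for x in xs:
        run = max(run, 0.0) + x
        best = max(best, run)
    return best


# Hypothetical example: U dominates V, so f(U_1, U_2, U_3) should dominate f(V_1, V_2, V_3).
U = {-1: 0.2, 0: 0.3, 2: 0.5}
V = {-1: 0.5, 0: 0.3, 2: 0.2}
f_U = pushforward(max_partial_sum, [U, U, U])
f_V = pushforward(max_partial_sum, [V, V, V])
```

With these inputs, `dominates(U, V)` and `dominates(f_U, f_V)` are both expected to hold, which mirrors the step from (29) to (28).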

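Case 2: $\mathcal{P}_0$ is any class of distributions satisfying (8).

Suppose that the change does not occur. Then we know by the stochastic ordering condition on $\mathcal{P}_0$
that $Y_i^{*} \succ Y_i^{\nu}$ for all $i$. It follows by Lemma III.1 that,

$$P_\infty(f(Y_1^{*}, \ldots, Y_N^{*}) \geq \eta) \geq P_\infty(f(Y_1^{\nu}, \ldots, Y_N^{\nu}) \geq \eta).$$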

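Since the above relation holds for all $N \geq 1$, we have

$$E_\infty^{\nu}(\tau^*) \geq E_\infty^{*}(\tau^*) = \frac{1}{\alpha} \tag{30}$$

and hence the value of FAR($\tau^*$) is no higher than $\alpha$ for all values of $\nu_0 \in \mathcal{P}_0$ and $\nu_1 \in \mathcal{P}_1$.

Now suppose the change-point is fixed at $\lambda$. A useful observation is that for any given stopping
rule $\tau$ and fixed post-change distribution $\nu_1$, the random variable $E_\lambda^{\nu_0, \nu_1}[(\tau - \lambda + 1)^+ \mid \mathcal{F}_{\lambda-1}]$ is a fixed
deterministic function of the random observations $(X_1, \ldots, X_{\lambda-1})$, irrespective of the distribution $\nu_0$.
Thus the essential supremum of this random variable depends only on the support of $\nu_0$. Applying this
observation to the stopping rule $\tau^*$, and using the relation (8), we have for all $\nu_0 \in \mathcal{P}_0$, $\nu_1 \in \mathcal{P}_1$,

$$\operatorname{ess\,sup} E_\lambda^{\nu_0, \nu_1}[(\tau^* - \lambda + 1)^+ \mid \mathcal{F}_{\lambda-1}] \leq \operatorname{ess\,sup} E_\lambda^{\overline{\nu}_0, \nu_1}[(\tau^* - \lambda + 1)^+ \mid \mathcal{F}_{\lambda-1}].$$

We also know from Case 1 above that for all $\nu_1 \in \mathcal{P}_1$,

$$\operatorname{ess\,sup} E_\lambda^{\overline{\nu}_0, \nu_1}[(\tau^* - \lambda + 1)^+ \mid \mathcal{F}_{\lambda-1}] \leq \operatorname{ess\,sup} E_\lambda^{\overline{\nu}_0, \underline{\nu}_1}[(\tau^* - \lambda + 1)^+ \mid \mathcal{F}_{\lambda-1}].$$
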
Taking the supremum over $\lambda \geq 1$, it follows from the above two relations that the value of WDD($\tau^*$)
under any pair of distributions $(\nu_0, \nu_1) \in \mathcal{P}_0 \times \mathcal{P}_1$ is no larger than that under $(\overline{\nu}_0, \underline{\nu}_1)$. Thus $\tau^*$ solves
the robust problem (0]). ■

C. Proof of Theorem III.3

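Proof: Let $\tau^*_{\mathrm{SRP}}$ denote the SRP stopping rule defined with respect to the LFDs $(\nu_0, \underline{\nu}_1)$
satisfying the asymptotic optimality property of (13) as mentioned in the statement of the theorem. It is
easy to see that for any integers $\lambda \geq 1$ and $N \geq 1$, we have

$$P_\lambda^{\nu_1}(\tau^*_{\mathrm{SRP}} - \lambda \leq N \mid \tau^*_{\mathrm{SRP}} \geq \lambda, R_0^* = r) = \frac{P_\lambda^{\nu_1}(\{\tau^*_{\mathrm{SRP}} - \lambda \leq N\} \cap \{\tau^*_{\mathrm{SRP}} \geq \lambda\} \mid R_0^* = r)}{P_\lambda^{\nu_1}(\tau^*_{\mathrm{SRP}} \geq \lambda \mid R_0^* = r)}$$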

where $R_0^*$ denotes the random variable used for initializing the iteration in (11). We
follow the same steps as in the proof of Theorem III.2. Let $Y_i^{\nu}$ denote the random variable $L^*(X_i)$ when
the pre-change and post-change distributions are $\nu_0$ and $\nu_1$ respectively. Since $\tau^*_{\mathrm{SRP}}$ is a stopping time, the
event $\{\tau^*_{\mathrm{SRP}} \geq \lambda\}$ is measurable with respect to the pre-change observations and hence we can represent
this event as,

$$\{\tau^*_{\mathrm{SRP}} \geq \lambda\} = \{(Y_1^{\nu}, Y_2^{\nu}, \ldots, Y_{\lambda-1}^{\nu}) \in T\}$$

where $T$ is the set of pre-change trajectories corresponding to the event $\{\tau^*_{\mathrm{SRP}} \geq \lambda\}$. Hence, for any
$r \in \mathbb{R}_+$, we can express the conditional probability as

$$P_\lambda^{\nu_1}(\tau^*_{\mathrm{SRP}} - \lambda \leq N \mid \tau^*_{\mathrm{SRP}} \geq \lambda, R_0^* = r) = \frac{\int_T P_\lambda(f_t(Y_\lambda^{\nu}, Y_{\lambda+1}^{\nu}, \ldots, Y_{\lambda+N}^{\nu}) \geq g(t, N, \eta) \mid R_0^* = r)\, P_\lambda((Y_1^{\nu}, Y_2^{\nu}, \ldots, Y_{\lambda-1}^{\nu}) \in dt \mid R_0^* = r)}{\int_T P_\lambda((Y_1^{\nu}, Y_2^{\nu}, \ldots, Y_{\lambda-1}^{\nu}) \in dt \mid R_0^* = r)} \tag{31}$$

such that for all $t \in T$, the function $f_t : \mathbb{R}^{N+1} \to \mathbb{R}$ satisfies the requirements of Lemma III.1 and
$g(t, N, \eta)$ is some real-valued function. The exact form of the function $f_t$ can be obtained from the iterations
of (11) used to define the SRP stopping rule of (12). We note that in (31) the post-change distribution
$\nu_1$ affects only the first term under the integral in the numerator. Thus, it follows by applying Lemma
III.1 that

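$$P_\lambda^{\underline{\nu}_1}(\tau^*_{\mathrm{SRP}} - \lambda \leq N \mid \tau^*_{\mathrm{SRP}} \geq \lambda, R_0^* = r) \leq P_\lambda^{\nu_1}(\tau^*_{\mathrm{SRP}} - \lambda \leq N \mid \tau^*_{\mathrm{SRP}} \geq \lambda, R_0^* = r) \tag{32}$$

for all $\nu_1 \in \mathcal{P}_1$. Hence it further follows that,

$$\sup_{\nu_1 \in \mathcal{P}_1} E_\lambda^{\nu_1}(\tau^*_{\mathrm{SRP}} - \lambda \mid \tau^*_{\mathrm{SRP}} \geq \lambda) = E_\lambda^{\underline{\nu}_1}(\tau^*_{\mathrm{SRP}} - \lambda \mid \tau^*_{\mathrm{SRP}} \geq \lambda). \tag{33}$$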

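We also observe that for any stopping rule $\tau$ that satisfies the false alarm constraint FAR($\tau$) $\leq \alpha$, we
have,

$$\begin{aligned}
\sup_{\nu_1 \in \mathcal{P}_1} \sup_{\lambda \geq 1} E_\lambda^{\nu_1}(\tau - \lambda \mid \tau \geq \lambda)
&\geq \sup_{\lambda \geq 1} E_\lambda^{\underline{\nu}_1}(\tau - \lambda \mid \tau \geq \lambda) \\
&\geq \sup_{\lambda \geq 1} E_\lambda^{\underline{\nu}_1}(\tau^*_{\mathrm{SRP}} - \lambda \mid \tau^*_{\mathrm{SRP}} \geq \lambda) + o(1) \\
&= \sup_{\nu_1 \in \mathcal{P}_1} \sup_{\lambda \geq 1} E_\lambda^{\nu_1}(\tau^*_{\mathrm{SRP}} - \lambda \mid \tau^*_{\mathrm{SRP}} \geq \lambda) + o(1)
\end{aligned}$$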

where the second relation follows from the fact that $\tau^*_{\mathrm{SRP}}$ satisfies the asymptotic optimality of (13) when
the true post-change distribution is $\underline{\nu}_1$, and the last equality follows from (33). This completes the proof
of the theorem. ■

We note that if the robust SRP stopping rule $\tau^*_{\mathrm{SRP}}$ is used when $\mathcal{P}_0$ is not a singleton, the crucial step
of (32) does not hold for $\nu_0 \in \mathcal{P}_0$ and $\nu_1 = \underline{\nu}_1$. Thus our proof of optimality of the robust SRP stopping
rule does not hold when the pre-change distribution is unknown.

D. Proof of Theorem III.4

Proof: The proof is very similar to that of Case 1 in Theorem III.2. Since the Shiryaev test is optimal
for known distributions, it is clear that the test given in (16) is optimal under the Bayesian criterion when
the post-change distribution is $\underline{\nu}_1$. Also from the definition of PFA($\tau^*$) it is clear that the probability of
false alarm depends only on the pre-change distribution and hence the constraint in © is met by the
stopping time $\tau^*$. Hence, it suffices to show that the value of ADD($\tau^*$) obtained under any $\nu_1 \in \mathcal{P}_1$ is
no higher than the value when the true post-change distribution is $\underline{\nu}_1$.

Let us first fix $\Lambda = \lambda$. We know by the stochastic ordering condition that conditioned on $\Lambda = \lambda$, for
all $i \geq \lambda$, we have $Y_i^{\nu} \succ Y_i^{*}$ where $Y_i^{*}$ and $Y_i^{\nu}$ are as defined in the proof of Theorem III.2. As before,
the function,

$$\left( \sum_{k=1}^{n} \pi_k \exp\Big( \sum_{i=k}^{n} x_i \Big) \right)$$

is continuous and non-decreasing in each of its components as required by Lemma III.1. Using these
facts, we can show the following by proceeding exactly as in the proof of Theorem III.2. Conditioned
on $\Lambda = \lambda$,

$$E_\lambda^{*}((\tau^* - \lambda)^+ \mid \mathcal{F}_{\lambda-1}) \succ E_\lambda^{\nu}((\tau^* - \lambda)^+ \mid \mathcal{F}_{\lambda-1}).$$

Thus, we have $E_\lambda^{*}((\tau^* - \lambda)^+) \geq E_\lambda^{\nu}((\tau^* - \lambda)^+)$ and by averaging over $\lambda$, we get,

$$E^{*}((\tau^* - \Lambda)^+) \geq E^{\nu}((\tau^* - \Lambda)^+).$$